Synchronizing audio signal sampling in a wireless, digital audio conferencing system

ABSTRACT

A digital audio conferencing system has a fixed base station that is in communication with a far end (F.E.) system over a communication network. The base station is associated with a wireless loudspeaker and one or more wireless microphones. The base station operates to receive F.E. audio signals to be played by the wireless loudspeaker, and it operates to remove acoustic echo picked up by the wireless microphones. A first clock controlling F.E. audio signal sampling at the base station and a second clock controlling audio signal sampling at a wireless microphone are synchronized to one master reference clock that controls the operation of the base station. Acoustic echo included in an audio signal picked up by a wireless microphone is removed by AEC functionality running in the base station.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuing application filed under 37 CFR 1.53(b) and claims the benefit under 35 U.S.C. 120 of U.S. patent application Ser. No. 13/541,148, entitled “SYNCRONIZING AUDIO SIGNAL SAMPLING IN A WIRELESS DIGITAL AUDIO CONFERENCING SYSTEM”, filed Jul. 3, 2012, the entire contents of which are incorporated herein by reference.

FIELD OF THE INVENTION

The present disclosure relates generally to audio conferencing systems having wireless microphones and wireless speakers.

BACKGROUND

Audio conferencing systems typically have a speaker, for playing audio generated by a far end audio signal source, and one or more microphones for capturing audio information generated by a near end or local audio signal source. As a consequence of the proximity of the speaker to one or more microphones in the system, at least some of the acoustic signal energy in the far end audio played by the speaker can be picked up by the microphones and sent back to the far end where it can be played and heard as acoustic echo. This acoustic echo can be very disruptive during the course of a conversation, as speakers at the far end may have to wait for the echo to subside before speaking again.

In order to mitigate the disruptive effects of acoustic echo in an audio conferencing system, acoustic echo cancellation (AEC) arrangements exist that have the effect of removing a large portion of the acoustic echo component in the local microphone signal before it is sent to the far end. FIG. 1 shows a typical prior art audio conferencing system 10 having a loudspeaker 12 and a microphone 13, both of which are hard wired to the system. The system 10 also includes a digital signal processor (DSP) 14 that includes, among other things, an adaptive filter 16 and a summation function 15. Generally, the audio conferencing system 10 operates as follows to cancel acoustic echo. An acoustic signal 11 generated by a far end (F.E.) audio source is received by the local audio conferencing system 10 (near end), which sends it to a loudspeaker 12 for reproduction. The far end acoustic signal is also sent to the DSP 14, which includes an adaptive filter 16 that is programmed to calculate an estimate of room echo (reference signal) 17. Typically, some energy from the acoustic signal 11 reproduced by the loudspeaker 12 is picked up by a microphone 13 (along with any local acoustic signal) and is sent to a summation function 15 operating in the DSP, which subtracts the calculated estimate of the room echo (reference signal) 17 from a microphone signal 18 to produce an echo cancelled signal 19 that is sent to the far end. This echo cancelled signal 19 is also sent to the DSP 14, which uses it to train the adaptive filter 16.

In order for the summation function to cancel the acoustic echo, both the reference signal and the microphone signal that includes the echo to be cancelled are processed by the summation function 15 at substantially the same time (aligned); otherwise some or all of the local room echo will not be cancelled. There is a round-trip delay between the time when the F.E. signal 11 is reproduced by the loudspeaker 12 and the time when the resulting acoustic echo, picked up by the microphone (along with the local acoustic signal), is sent to the DSP 14 as the microphone signal 18. This delay can be determined using empirical methods and programmed into the system 10, where it is used to determine when in time the reference signal 17 is subtracted from the microphone signal 18.

As all of the signal processing typically takes place in a single DSP, DSP 14 in this case, the timing associated with the subtraction of the reference signal from the microphone signal cancelling the acoustic echo can be easily controlled. Specifically, since sampling of the F.E. acoustic signal 11 and the local microphone signal 18 takes place in a single device, DSP 14, the timing relationships between these signals are known.
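The prior art arrangement of FIG. 1 can be illustrated with a minimal sketch. It assumes a normalized least mean squares (NLMS) update for the adaptive filter, which is one common choice and is not named in this description. The function name, tap count, step size and simulated room response are all hypothetical. In this sketch the round-trip delay is absorbed by the span of the filter taps rather than programmed as a separate value.

```python
import random

def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Return the echo cancelled signal for a reference and a microphone signal.

    NLMS is an assumed adaptation rule; taps, mu and eps are hypothetical values.
    """
    w = [0.0] * taps   # adaptive filter coefficients (adaptive filter 16)
    x = [0.0] * taps   # most recent reference samples, newest first
    out = []
    for r, m in zip(reference, mic):
        x = [r] + x[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x))  # estimate of room echo 17
        e = m - echo_estimate                                 # summation function 15
        norm = sum(xi * xi for xi in x) + eps
        # The echo cancelled signal is fed back to train the filter.
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out

def power(signal):
    return sum(v * v for v in signal) / len(signal)

# Hypothetical room: the echo is the far end signal delayed by two samples
# and attenuated. No local talker is present in this simulation.
random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(4000)]
room = [0.0, 0.0, 0.6, 0.3]
mic = [
    sum(room[k] * far_end[n - k] for k in range(len(room)) if n - k >= 0)
    for n in range(len(far_end))
]
residual = nlms_echo_cancel(far_end, mic)
```

Because both signals in this simulation share one sample index, the filter converges and the residual echo power falls far below the microphone signal power. That shared timing is what a single DSP provides in the hard-wired system.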

DESCRIPTION OF THE RELATED ART

United States patent application publication no. 2004/0131201 entitled “Multiple Wireless Microphone Speakerphone System and Method” describes a speakerphone arrangement with wireless microphones that includes an analog audio speech processing unit 206 and that has a speaker 122 hard-wired to a speakerphone pod. Each wireless microphone 102 a to 102 n can transmit an audio signal over a separate frequency or during a separate timeslot to one of several receivers 1-N, each of which is dedicated to a single wireless microphone. Associated with each receiver is a separate audio processing and AEC unit 114 a to 114 n, which operate to remove acoustically coupled speaker signals from the microphone signals. While it is convenient to provide wireless microphones with a speakerphone system, prior art speakerphones with wireless microphones employ analog transmission techniques to send and receive the microphone signals, and such analog transmission techniques do not efficiently utilize the available wireless spectrum.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of a wired audio conference phone.

FIG. 2A is a diagram of an embodiment of a digital audio conference phone having wireless microphones and speakers.

FIG. 2B is a diagram of an embodiment of a multimedia conference system.

FIG. 3 is a diagram showing an audio conferencing system base station.

FIG. 4 is the base station of FIG. 3 in greater detail.

FIG. 5 is an illustration of a waveform sampled at a DSP and at a wireless microphone with synchronized sampling clocks.

FIG. 6 is an illustration of a waveform sampled at a DSP and wireless microphone with unsynchronized clocks.

DETAILED DESCRIPTION

Wireless digital communications protocols, such as DECT, and a variation referred to as Personal Wireless Communication (PWC), use an acoustic audio signal coding technique, such as the Constrained Energy Lapped Transform (CELT) or the Adaptive Differential Pulse Code Modulation (ADPCM) techniques, that permits the transmission of more information over a wireless medium in the same amount of time. Using either of these techniques to code an audio signal results in the compression of the audio signal without audible loss in audio fidelity or with an acceptable loss of audio fidelity. As a consequence of the advantages of digital, wireless communication protocols, many of the analog techniques for the wireless transmission of information are becoming obsolete. In this regard, an audio conferencing system having wireless microphones and a base station is designed such that the microphones and the base station transmit and receive acoustic audio information using a wireless digital communication protocol, such as the DECT protocol. Each of the microphones and the base station in the conferencing system has a transceiver device (radio) that operates to compress a digitized acoustic audio signal (or simply audio signal) for transmission or to decompress a received, compressed audio signal.

As with a wireless, analog audio conferencing system, it is necessary to remove acoustical echo received by each of the wireless microphones in a digital, wireless audio conferencing system. In order to remove acoustical echo from a signal before sending it to a far end system, it is necessary to sample and process an audio signal received from the F.E. system for reproduction by a local loudspeaker and it is necessary to sample and process a local microphone signal. The audio signal is sampled a first time by a first signal sampling device located in a base station (being driven by a first clock local to the base station) after it is received from a F.E. source, and it is sampled a second time (echo component) by a second signal sampling device located in a wireless microphone (being driven by a second clock local to the wireless microphone) after it is picked up by the wireless microphone. In order for the AEC to remove some or most of the echo component in the local microphone signal, the base station includes functionality that aligns the first audio signal samples and the second audio signal samples in time, and one hundred eighty degrees out of phase. However, if the first and second clocks driving the sampling functionality at the base station and at the wireless microphones respectively are not synchronized (due to differential clock drift between the first and second clocks, for instance), some of the audio information in the first audio sample can be different than the audio information in the second audio sample, and consequently it can be very difficult or not possible to align the audio information in time in order to perform the AEC operation.

It was discovered that the different clocks driving the sampling functionality at the base station and at the microphone can be derived from a single reference clock that drives the digital wireless transmission functionality (radio) in the base station. Since the base station (fixed part or F.P.) reference clock is the master clock to which each mobile device or mobile part (M.P.) radio reference clock slaves, all of the reference clocks in each part can be synchronized to the base station reference, and since each clock driving the sampling functionality at the F.P. and the M.P. is derived from a reference that is synchronized to the master reference, the base station is able to align in time audio information in the first audio signal samples relating to an echo estimate with corresponding acoustic echo information in the second audio signal samples to cancel the acoustic echo component included in the local microphone signal.

An embodiment of a digital conferencing system 20A is shown in FIG. 2A having a base station 21, a plurality of wireless microphones, MIC.0-MIC.n, and a plurality of wireless loudspeakers 23. The base station 21 can be referred to as a fixed part (F.P.) and each of the wireless microphones and the wireless loudspeaker can be referred to as a mobile part (M.P.). The base station or F.P. 21 is, in this case, connected over a link that is hard-wired to a communications network 22, such as a POTS network or an IP Internet Network, and generally operates to receive audio information from a far end (F.E.) audio source over the network 22, and it operates to transmit audio information picked up by any of the local microphones, MIC.0-MIC.n, to the F.E. over the network 22. The base station 21 also operates to digitize and compress the audio information received from the F.E. and transmit a digitized, compressed audio signal over the air to the wireless loudspeaker 23, where the audio signal is decompressed and converted to an analog audio signal for reproduction by the loudspeaker. Each of the wireless microphones, MIC.0-MIC.n, operates to pick up acoustic audio information from a source, such as an individual speaker who is proximate to the microphone, convert the analog audio signal into a digital audio signal, compress the digitized audio signal and send the audio signal over a medium (in this case the air) to the base station 21 where the signal is decompressed.

Continuing to refer to FIG. 2A, in addition to digitizing and compressing the audio signal received from the F.E., the base station 21 also includes functionality that operates to remove/cancel acoustical echo from audio signals received from any of the microphones, MIC.0-MIC.n. This acoustical echo cancellation (AEC) functionality can be implemented in one or more digital signal processing (DSP) devices such as DSP 24 included in the base station 21. In operation, the AEC functionality comprising DSP 24 uses a F.E. audio signal and a local echo cancelled audio signal to calculate an estimate of an acoustical echo, and then removes this estimate of the acoustical echo from a local microphone signal before the conferencing system 20A sends the local audio signal over the network 22 to the F.E. More specifically, the DSP 24 receives a F.E. audio signal 25 and an echo cancelled signal 26 as inputs, which it uses to calculate an estimate of acoustical echo picked up by any one or more of the local microphones, MIC.0-MIC.n. The DSP 24 also receives an audio signal 27 from any one or more of the local microphones and subtracts the echo estimate from this microphone signal.

Also shown in FIG. 2A is a master reference clock 28A generated by the base station 21 digital radio, which is used to derive a sampling clock for the DSP 24. The reference clock 28A is the clock used to control the operating frequency (carrier frequency) of the radio transmissions to the wireless speaker with which the base station is associated. As can be seen in FIG. 2A, the reference clock 28A is divided down from a nominal carrier frequency to, in one embodiment, a 52 kHz clock rate. The DSP 24 then uses this 52 kHz clock to control the rate at which it samples the F.E. audio signal input to the DSP, which is typically referred to as a reference signal. Similar to the base station, each of the digital radios comprising the local wireless microphones, MIC.0-MIC.n, in FIG. 2A and the wireless speaker also generates a reference clock (28B and 28C respectively) that is synchronized to the master reference clock in the base station 21. This reference clock generated by the radio in each microphone is divided from the nominal carrier frequency to a frequency that can be used to sample an acoustical audio signal received at each of the microphones, which in this case is the same clock rate, 52 kHz, as used by the DSP 24 in the base station. As will be described in greater detail later with reference to FIG. 5, synchronizing the clocks that are used to control the sampling in the base station and in the microphones ensures that the audio information captured at each location can be aligned during the AEC process.
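The effect of deriving both sampling clocks from synchronized references can be sketched numerically. In the sketch below, only the 52 kHz sampling rate comes from this description. The reference frequency, the divider value, the 20 ppm offset and the function names are hypothetical values chosen for illustration.

```python
from fractions import Fraction

REFERENCE_HZ = 10_400_000  # hypothetical reference frequency, not from this description
DIVIDER = 200              # hypothetical divider chosen so the result is 52 kHz

def sampling_clock_hz(reference_hz, divider=DIVIDER):
    """Divide a reference clock down to a sampling clock rate."""
    return Fraction(reference_hz) / divider

def free_running(reference_hz, ppm):
    """Model a reference that is not slaved to the master and is off by ppm."""
    return Fraction(reference_hz) * (1 + Fraction(ppm, 1_000_000))

# Base station sampling clock derived from the master reference.
base = sampling_clock_hz(REFERENCE_HZ)

# Microphone whose reference is synchronized to the master: identical rate.
mic_slaved = sampling_clock_hz(REFERENCE_HZ)

# Microphone whose reference runs free with a hypothetical +20 ppm error.
mic_free = sampling_clock_hz(free_running(REFERENCE_HZ, 20))

print(base, mic_slaved, float(mic_free))
```

Under these assumed values the slaved microphone samples at exactly the base station rate, while the free-running microphone samples about one sample per second faster, so its samples gradually slide relative to the reference signal samples.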

In another embodiment, a multimedia conferencing system 20B is connected to a network, such as an internetwork or a local network, and receives and/or transmits multimedia information from and/or to another multimedia conferencing system also connected to the network. The multimedia (MM) conferencing system 20B is very similar to the digital wireless conference system 20A described with reference to FIG. 2A, with the primary difference being that the microphones and speakers are hard-wired to the system as opposed to communicating with a base station over a wireless medium. According to the embodiment of FIG. 2B, the conference system 20B is comprised of a processing device, at least one loudspeaker, one or more microphones and at least one camera, all of which can be hard-wired to the processing device. The processing device is comprised of a digital signal processor (DSP) which is programmed to, among other things, run an audio application which operates to remove acoustical echo from a local signal that is sent back over the network to another, far end, MM system. The system 20B can receive a MM signal, which can have an audio and a video component, over the network from another MM system (or non-MM system, such as an audio conferencing system) connected to the network, and it can receive a network master clock signal from the network. The network master clock signal can be generated by a master clock generation device that is running external to the network (and connected to the network) or which is running internal to the network. The method for distributing the master clock signal throughout the network will not be described here, as techniques for such distribution are well known. The master clock signal is used by the network to drive the operation of certain equipment connected to the network, such as the MM conferencing device 20B, or any other device that can operate under the control of the network master clock.

The MM signal received by the system 20B is sent to a loudspeaker connected to the system and it is sent to a DSP which is programmed to, among other things, remove acoustic echo from the local audio signal sent over the network to another MM system. The master clock signal received by the system 20B is sent to a clock derivation function comprising the processing device, which operates to derive a local clock that can be employed by the DSP to drive functionality associated with an echo cancellation application operating in the DSP. In this case, the clock derivation functionality modifies (divides) the master clock rate to be a clock rate that a local clock can use to drive the audio processing application running on the processing device. While the clock derivation function described above operates to divide the master clock rate down to a rate used by the local clock, it should be understood that the master clock rate can be multiplied to increase the clock rate, or some other more complex derivation function can be used to derive the localclock rate.
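The clock derivation function can be sketched as a rational scaling of the master clock rate, which covers both the divide case and the multiply case mentioned above. The function name and the master clock rate below are hypothetical. This description does not specify the network master clock rate or the local clock rate for system 20B.

```python
from fractions import Fraction

def derive_local_rate(master_hz, multiply=1, divide=1):
    """Derive a local clock rate as master_hz * multiply / divide.

    A hypothetical helper: divide-only, multiply-only, or both.
    """
    if multiply <= 0 or divide <= 0:
        raise ValueError("multiply and divide must be positive integers")
    return Fraction(master_hz) * Fraction(multiply, divide)

MASTER_HZ = 2_048_000  # hypothetical network master clock rate

divided = derive_local_rate(MASTER_HZ, divide=128)             # divide only
scaled = derive_local_rate(MASTER_HZ, multiply=3, divide=128)  # multiply and divide

print(divided, scaled)
```

Exact rational arithmetic is used here so that the derived rate carries no rounding error of its own. Any remaining mismatch between two devices would then come only from their reference clocks.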

FIG. 3 is a diagram showing the base station 21 having AEC functionality 31 and a digital radio. The AEC 31 can be implemented in the DSP 24 of FIG. 2, and the operation of the AEC 31 is generally well known to audio engineers and so will not be described here in great detail. As shown in FIG. 3, the AEC 31 receives and samples at sampling functionality 32 an audio signal received from a F.E. source. The sampled signal is used as a reference signal at the input to an adaptive filter 33. The audio information comprising the reference signal and an echo cancelled signal is used by the adaptive filter 33 to calculate an estimate of acoustical echo picked up by any one of the local microphones, MIC.0-MIC.n. This estimate of acoustical echo is subtracted from a local microphone signal by a summation function 34, the result of which is an echo cancelled audio signal 36 which is both sent to the F.E. and applied to the adaptive filter 33 as discussed above. Note that in the embodiment of FIG. 3, the clock used to control the sample rate is derived from the master clock generated by the radio in the base station.

FIG. 4 is a diagram illustrating much of the same functionality described with reference to FIG. 3; however, in this figure the F.E. audio signal used as input to an AEC 42 sampling function 44 is first compressed in the radio. As described earlier, the coding technique currently employed by DECT devices is the adaptive differential pulse code modulation (ADPCM) technique (or alternatively CELT), and using this technique to code an audio signal results in the compression/decompression of the audio signal with little or no audible loss in audio fidelity. Specifically, a F.E. audio signal received at the base station 40 is compressed for transmission over the medium to a wireless speaker, such as the wireless speaker 23 in FIG. 2. The audio signal received by the speaker 23 is then decompressed and played. Some of the acoustic signal played by the loudspeaker can be picked up by a local microphone, where it is compressed and transmitted to the base station and decompressed. While the fidelity of the audio signal is not appreciably degraded by the cycle of compression and decompression, audio signal information is lost as the result of this compression/decompression cycle, which can make it difficult for the AEC 42 to converge to effectively cancel acoustic echo (because the reference signal includes different audio information than the microphone signal). In order to compensate for the microphone signal compression, a decompression function 43 is included in the AEC 42 that compensates for the loss of audio signal information resulting from the decompression of the speaker signal. The compensation method is described in detail in PCT/US2012/026147, the contents of which have been incorporated by reference in this description.

In operation, the radio in base station 40 receives an audio signal from a F.E. audio source and compresses the audio signal for transmission to a wireless speaker. The compressed F.E. audio signal is also sent to the AEC 42, where it is first decompressed by the decompression function 43 and then sampled by sampling function 44. The decompressed and sampled audio signal is used as a reference signal at the input to an adaptive filter 45 which operates to calculate, using an echo cancelled audio signal from a summation function 46, an estimate of an acoustical echo. This acoustical echo estimate is one input to the summation function 46, and a local microphone signal is a second input to the summation function. The summation function then subtracts the estimated echo from the local microphone signal and the base station sends the local microphone signal to the F.E. with the echo removed. More specifically, a local microphone, such as the wireless microphone MIC.0, receives acoustic audio information from one or more local acoustical sources, such as a speaker and a loudspeaker. The component of the acoustic audio picked up from the loudspeaker is the acoustic echo component to be cancelled. The local microphone signal is then compressed, by a digital radio, for transmission to the base station where the signal is decompressed and sent to the AEC 42. As with the base station 21 described with reference to FIG. 3, the sampling functionality 44 is controlled by a clock that is derived from the master reference clock generated by the digital radio.

As mentioned earlier with reference to FIG. 2, synchronizing the clock that is used to control the sampling at the base station of a F.E. audio signal destined for a wireless speaker with each of the clocks used to sample an acoustic signal at the local microphones ensures that the audio information captured at each location can be aligned by AEC 42, which results in the maximum cancellation of acoustic echo. FIG. 5 illustrates how synchronizing the clock controlling the sampling of audio information at the DSP with each clock controlling the sampling of audio information at the wireless microphones permits corresponding or aligned DSP and microphone samples to include similar audio information, which allows the AEC 42 to align the sampled audio information in order to cancel any acoustic echo component included in a microphone signal. The term “similar”, according to this description, means that the acoustic signal played by the wireless loudspeaker is modified according to the acoustical properties of the room in which the conferencing system is operating before being picked up by a microphone, so the audio signal/information picked up by the microphone is not exactly the same as the audio signal/information played by the loudspeaker, but is the signal played by the loudspeaker in modified form. FIG. 5 shows an audio signal 50 received from a F.E. source destined for a wireless speaker, such as loudspeaker 23 in FIG. 2, and an audio signal 51 picked up by any one of the microphones, MIC.0-MIC.n in FIG. 2. Although the microphones pick up local audio information that is comprised of acoustic audio information generated by a local source, such as a speaker, and acoustic audio information from loudspeaker 23, for the purpose of this description the audio signal 51 only represents a waveform that includes the acoustic echo component picked up by the microphone from the loudspeaker 23.

A first sample (S.1) of the audio information (audio information A) comprising signal 50 is captured on a first clock edge at time T.1, a second sample (S.2) of audio information comprising signal 50 is captured at time T.2, a third sample (S.3) is captured at time T.3, a fourth sample (S.4) is captured at time T.4, and so forth. Each of the samples includes a plurality of bits (8, 10, 12 or more) which together can represent a characteristic of the audio signal 50, such as amplitude, at the point in time the sample is captured. The position in time of the clock edges used by the DSP to start the capture of each sample is controlled by a sampling clock which in one embodiment is derived from and synchronized to the master reference clock generated by the radio on the base station 21.

Continuing to refer to FIG. 5, the acoustic audio signal 51 picked up by one of the local microphones is sampled at the microphone as follows. A first sample of audio information (S.1′) comprising signal 51 is captured on a first clock edge at time T.1, a second sample of audio information (S.2′) is captured on a second clock edge at time T.2, a third sample of audio information (S.3′) is captured on a third clock edge at time T.3, a fourth sample of audio information is captured on a fourth clock edge at time T.4 and a fifth sample of audio information is captured on a fifth clock edge at time T.5, and so forth. Each of the samples includes a plurality of bits which together represent a value indicative of the audio information captured from the signal 51. The position in time of the clock edges used by the sampling function on the microphone to start the capture of audio information associated with each sample is controlled by the sampling clock derived from and synchronized to, as described earlier with reference to FIG. 2, the master reference clock generated by the radio on the base station 21.

An examination of the two audio signals 50 and 51 in FIG. 5 shows that there is a time delay of one sampling clock cycle between the time the audio information is captured in sample S.1 and the time the same audio information is picked up by the microphone and captured in sample S.2′. The exact value of this delay can be determined empirically and programmed into the AEC functionality of DSP 24. Knowing the value of this delay, the DSP 24 (more specifically, the AEC 42 in FIG. 4) can operate to align the audio information captured from the loudspeaker signal 50 with the audio information captured from the microphone signal 51. More specifically, it can be seen in FIG. 5 that the audio information captured in sample S.1 in signal 50 starting at time T.1 corresponds to the audio information captured in sample S.2′ in signal 51 starting at time T.2. In this case the signal propagation delay is one sampling clock cycle (but the delay can be more or less than one cycle), and this delay can be programmed into the AEC functionality 42 as described above.
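The alignment of FIG. 5 can be sketched as pairing each reference sample with the microphone sample captured a programmed number of clock cycles later. The function name is hypothetical, and the sketch assumes a delay of a whole number of sampling clock cycles, as in the one-cycle example of FIG. 5.

```python
def align(reference_samples, mic_samples, delay):
    """Pair reference sample S.k with microphone sample S.(k + delay)'.

    delay is the programmed propagation delay in whole sampling clock cycles.
    """
    if delay < 0:
        raise ValueError("delay must be zero or more sampling clock cycles")
    return list(zip(reference_samples, mic_samples[delay:]))

# Sample labels from FIG. 5 stand in for the captured sample values.
reference = ["S.1", "S.2", "S.3", "S.4"]
microphone = ["S.1'", "S.2'", "S.3'", "S.4'", "S.5'"]

pairs = align(reference, microphone, delay=1)
print(pairs[0])  # ('S.1', "S.2'")
```

With a delay of one cycle, S.1 is paired with S.2′, S.2 with S.3′, and so forth, which matches the correspondence described for FIG. 5.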

As illustrated in FIG. 6, if the clocks used to control the signal sampling at a DSP located in a base station (F.P.) and at the wireless microphones (M.P.) are not synchronized to a common, master reference clock, then the audio information captured in samples at the DSP and microphones can be different, resulting in the AEC not removing the acoustic echo or not removing as much of the acoustic echo component in the microphone signal as is possible using the method of this invention. Specifically, FIG. 6 shows that the time at which each of the samples is captured from signal 51 is offset in time (captured later in time) from the time at which each of the samples is captured from signal 50. In this case, when the samples captured from signal 50 are aligned with the samples in signal 51 (similarly to the alignment described earlier with reference to FIG. 5, with sample S.1 aligned to sample S.2′ and so forth), the audio information in the signal 50 samples will not be the same as the audio information in the signal 51 samples. This misalignment of the audio information will prevent the AEC 42 from removing some or all of the acoustic echo in the local microphone signal.
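The mismatch that arises between unsynchronized clocks can be illustrated with a small simulation. It models the differential clock drift mentioned earlier as a fixed frequency offset, which is one way the sampling instants of FIG. 6 can come to differ. Only the 52 kHz rate comes from this description. The test tone, the 50 ppm offset and the function names are hypothetical.

```python
import math

FS = 52_000.0      # sampling rate from this description
TONE_HZ = 1_000.0  # hypothetical test tone

def sample(n, fs, tone_hz=TONE_HZ):
    """Value of the tone at the n-th edge of a sampling clock running at fs."""
    return math.sin(2.0 * math.pi * tone_hz * n / fs)

def worst_mismatch(ppm, n_samples):
    """Largest difference between same-index samples taken by the two clocks."""
    fs_mic = FS * (1.0 + ppm * 1e-6)  # microphone clock offset by ppm
    return max(abs(sample(n, FS) - sample(n, fs_mic)) for n in range(n_samples))

synced = worst_mismatch(0, 52_000)     # clocks locked to one master reference
drifting = worst_mismatch(50, 52_000)  # hypothetical 50 ppm offset over one second

print(synced, drifting)
```

With synchronized clocks the same-index samples are identical. With the assumed 50 ppm offset, the samples diverge by a substantial fraction of the tone amplitude within one second, so a fixed programmed delay no longer pairs samples that carry the same audio information.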

The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that specific details are not required in order to practice the invention. Thus, the foregoing descriptions of specific embodiments of the invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, and thereby to enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the following claims and their equivalents define the scope of the invention.

I claim:
1. A base station comprising a wireless, digital audio conferencing system operates to derive a local clock from a master clock signal generated by an external network to which it is connected, a method comprising: receiving the master clock signal at the base station, the master clock signal having a first clock rate; deriving the local clock by the base station modifying the first clock rate of the master clock signal to be a second clock rate, the second clock rate being different than the first clock rate; and using the second clock rate to control the operation of the base station to capture and process multimedia information.
2. The method of claim 1, wherein the base station runs a multimedia processing application that is one of an audio conference application or a video conference application.
3. The method of claim 2, wherein the audio conference application is comprised of functionality for capturing acoustic audio information and removing acoustic echo from the captured audio information.
4. The method of claim 1, wherein the multimedia information is comprised of one or both of audio and video information.
5. The method of claim 1, wherein the second clock rate is derived by dividing the first clock rate comprising the master clock signal.